Computed Tomography slice interpolation in the longitudinal direction based on deep learning techniques: To reduce slice thickness or slice increment without dose increase

A large slice thickness or slice increment leaves Computed Tomography (CT) data with insufficient information in the longitudinal direction, which degrades the quality of CT-based diagnosis. Traditional approaches such as high-resolution computed tomography (HRCT) and linear interpolation can address this problem. However, HRCT increases the radiation dose, and linear interpolation causes artifacts. In this study, we propose a deep-learning-based approach to reconstruct densely sliced CT from sparsely sliced CT data without any dose increase. The proposed method reconstructs CT images from neighboring slices using a U-net architecture. To prevent multiple reconstructed slices from influencing one another, we propose a parallel architecture in which multiple U-net architectures work independently. Moreover, for a specific organ (i.e., the liver), we propose a range-clip technique to improve reconstruction quality, which enhances the learning of CT values within this organ by enlarging the range of the training data. CT data from 130 patients were collected, with 80% used for training and the remaining 20% used for testing. Experiments showed that, compared with linear interpolation, our parallel U-net architecture reduced the mean absolute error of CT values in the reconstructed slices by 22.05% and also reduced the incidence of artifacts around the boundaries of target organs. Further improvements of 15.12%, 11.04%, 10.94%, and 10.63% were achieved for the liver, left kidney, right kidney, and stomach, respectively, using the proposed range-clip algorithm. We also compared the proposed architecture with the original U-net method, and the experimental results demonstrated the superiority of our approach.


Introduction
performance. This combination achieved densely sliced CT reconstruction in the longitudinal direction without changing any CT device parameters such as slice thickness or pitch. Fig 1 shows the proposed CT slice reconstruction process, which serves two purposes: (1) reconstructing densely sliced CT to improve the resolution in the longitudinal direction without any dose increase, and (2) interpolating middle slices to fill the gap between two adjacent CT slices when the slice increment is larger than the slice thickness.

Materials and methods
In this section, we first describe the proposed parallel U-net architecture for CT slice reconstruction in the longitudinal direction and then explain the range-clip technique for improving the reconstruction quality of each specified organ.

Parallel U-net architecture
Suppose we reconstruct 2m − 1 middle slices from each pair of adjacent slices. Let the sequence φ represent the target dense CT slices after reconstruction, i.e., $\varphi = \{S_1, S_2, \ldots, S_{t-m}, S_{t-(m-1)}, \ldots, S_t, \ldots, S_{t+m-1}, S_{t+m}, \ldots, S_n\}$, where t is the index of the sorted slices and n is the total number of dense CT slices. The proposed approach reconstructs $\{S_{t-(m-1)}, \ldots, S_t, \ldots, S_{t+m-1}\}$ from $S_{t-m}$ and $S_{t+m}$ to achieve densely sliced CT reconstruction in the longitudinal direction. The architecture of the proposed reconstruction algorithm is shown in Fig 2. To prevent the different reconstructed slices from influencing one another, we adopt a parallel architecture with multiple U-nets. Each U-net reconstructs its own target CT slice from the same pair of input slices. Both the input and output slices are 400 × 320 pixels. Each U-net contains eight encoder modules and seven decoder modules, as shown in Fig 3. A dropout layer is added to the last layer of the decoder module to reduce over-fitting.
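The parallel arrangement can be sketched as follows. This is a minimal structural illustration under stated assumptions, not the authors' implementation: the names `LinearBlendStub`, `ParallelReconstructor`, and `make_stub_reconstructor` are hypothetical, and each stub simply blends the two input slices linearly (it behaves like linear interpolation) so that the example runs without a trained network. In practice each stub would be replaced by a separately trained U-net with its own parameters.

```python
import numpy as np


class LinearBlendStub:
    """Hypothetical stand-in for one trained U-net.

    It maps the two input slices to one output slice by linear blending;
    a real model would be a U-net with its own learned parameters.
    """

    def __init__(self, weight: float) -> None:
        # Relative position of the target slice between the inputs, in (0, 1).
        self.weight = weight

    def __call__(self, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
        return (1.0 - self.weight) * lower + self.weight * upper


class ParallelReconstructor:
    """Holds one independent model per middle slice (2m - 1 models in total)."""

    def __init__(self, models) -> None:
        self.models = list(models)

    def reconstruct(self, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
        # Every model receives the same input pair and none sees another's
        # output, so the reconstructed slices cannot influence one another.
        outputs = [model(lower, upper) for model in self.models]
        return np.stack(outputs, axis=0)  # shape: (2m - 1, H, W)


def make_stub_reconstructor(num_middle: int) -> ParallelReconstructor:
    """Build 2m - 1 stubs; the k-th sits at position k / (2m) between the inputs."""
    return ParallelReconstructor(
        LinearBlendStub(k / (num_middle + 1)) for k in range(1, num_middle + 1)
    )


if __name__ == "__main__":
    lower = np.zeros((400, 320))      # input slice S_{t-m}
    upper = np.full((400, 320), 6.0)  # input slice S_{t+m}
    middle = make_stub_reconstructor(5).reconstruct(lower, upper)  # m = 3
    print(middle.shape)  # (5, 400, 320)
```

The design point the sketch isolates is that the models share inputs but not parameters or outputs; swapping a stub for a trained network does not change the surrounding structure.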
We also implemented a learning technique that uses one U-net to reconstruct multiple slices. However, its accuracy was lower than that of the proposed parallel architecture (see the comparison in the Results section). Compared with the original U-net method, the proposed parallel approach learns the nonlinear relationships among multiple neighboring slices using multiple sets of parameters. As mentioned above, the proposed architecture can be used either to interpolate middle slices to fill the gap or to reconstruct slices to reduce the slice thickness.

Organ-oriented reconstruction
In this section, we discuss slice reconstruction for a specific organ, which we call organ-oriented reconstruction. The range of CT values across all human organs is wide, but for a particular organ, e.g., the liver, the range can be much narrower. The reconstruction quality of such an organ can therefore be improved by focusing on the range of CT values within it and then enlarging this range. In this study, we used a CT dataset in which the region of interest (ROI) of each organ had been labeled by board-certified radiation oncologists. Based on these labeled data, we obtained an approximate distribution of the CT values for each organ and then assessed the organ's range.
Fig 2. Architecture of the proposed approach for CT slice reconstruction. The neural networks deduce multiple middle slices from two neighboring slices. A parallel architecture is adopted to prevent the different target slices from influencing one another, so that the target slices are computed separately. https://doi.org/10.1371/journal.pone.0279005.g002
Fig 4 shows the CT value range of the liver compared with that of the whole slice in the training dataset. Foreign objects inside the body, such as drainage tubes or surgical staples, may introduce noise into the distribution. Therefore, we manually checked the distribution for each organ and, where noise was present, removed it manually with filters. The CT values across all training data were normalized to [0, 1], and the range $[R_{ll}, R_{lh}]$ of CT values of pixels inside the liver was then computed from the labeled training data ($0 \le R_{ll} \le 1$, $0 \le R_{lh} \le 1$, and $R_{ll} \le R_{lh}$). Finally, a linear function f(x) was established to transform the CT value x of each pixel, enlarging the liver's range from $[R_{ll}, R_{lh}]$ to [0, 1]. The transformation function f(x) is defined by Eq (1), in which C and B are constants that ensure that f(x) falls within [0, 1] for liver pixels. C and B are determined exclusively from the training data, and their values change when the range-clip is applied to other organs. Because the range of the liver is enlarged, the reconstruction quality of this region improved after range-clip processing. Although regions outside the liver may lose information during range-clip processing, this did not affect the final results because we also trained a model on the original data to preserve information about non-liver regions. In other words, we trained the networks for reconstructing the liver and non-liver regions using the range-clipped and original data, respectively, and then merged the outputs of both trained networks. To maintain consistency during merging, we applied f to the test data before inputting them into the trained networks and applied $f^{-1}$ to the network's output.
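The range-clip transform and its inverse can be sketched as follows. Because the text does not reproduce Eq (1), this sketch assumes the affine form f(x) = Cx + B, which gives C = 1/(R_lh − R_ll) and B = −R_ll/(R_lh − R_ll). All function names are hypothetical, and two further assumptions are marked in the code: values mapped outside [0, 1] are clipped, and at test time (where no organ labels exist) a pixel takes the range-clip network's output when the original network's output lies in [R_ll, R_lh]. The paper does not specify its merging rule, so this is only one plausible choice.

```python
import numpy as np


def organ_range(normalized, organ_mask):
    """Estimate [R_ll, R_lh] from labeled organ pixels of normalized training data.

    Assumes noise from foreign objects has already been filtered out, so the
    plain min/max of the organ pixels is used.
    """
    values = normalized[organ_mask]
    return float(values.min()), float(values.max())


def range_clip_params(r_ll, r_lh):
    """Constants of the ASSUMED affine form f(x) = C*x + B mapping [r_ll, r_lh] to [0, 1]."""
    if not 0.0 <= r_ll < r_lh <= 1.0:
        raise ValueError("expected 0 <= r_ll < r_lh <= 1")
    c = 1.0 / (r_lh - r_ll)
    b = -r_ll * c
    return c, b


def f(x, c, b):
    # Enlarge the organ's range to [0, 1]; values outside it are clipped (assumption).
    return np.clip(c * x + b, 0.0, 1.0)


def f_inv(y, c, b):
    # Map network outputs back from the enlarged range to the normalized range.
    return (y - b) / c


def merge_outputs(original_out, clipped_out, r_ll, r_lh, c, b):
    """Combine the two networks' outputs (hypothetical merging rule).

    original_out: output of the network trained on original data (normalized range).
    clipped_out:  output of the network trained on range-clipped data (f space).
    """
    in_range = (original_out >= r_ll) & (original_out <= r_lh)
    return np.where(in_range, f_inv(clipped_out, c, b), original_out)
```

Routing by value range rather than by an organ mask keeps the test phase label-free, consistent with the statement that only the training data need to be labeled.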
As with the liver, the proposed range-clip-based learning method can be applied to other organs (e.g., lungs, kidneys, stomach). The designed training process is shown in Fig 5. The range-clip technique enhances learning within a specified organ by exploiting the range features of the training data. It can improve reconstruction quality while requiring labels only for the training data, not for the test data. Air inside the stomach occasionally degrades the reconstruction quality; therefore, we used filters to exclude air regions inside the stomach before performing organ-oriented CT reconstruction.
In summary, organ-oriented CT reconstruction combines the parallel U-net architecture with the range-clip technique, improving the learning of pixels within a specific organ by exploiting its range feature.

Database detail
Our database contains 130 CT series from 130 patients. The CT data had a voxel resolution of 1.0742 × 1.0742 × 2.5 mm³, and their values were clipped to [−1000, 1000] HU (Hounsfield units) for observing soft-tissue organs. Regions outside this range, such as bone, were not considered in this study. For each CT series, labels of the liver, left kidney, right kidney, and stomach regions were provided by board-certified radiation oncologists.
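A minimal sketch of the intensity preprocessing is given below. It assumes that the [0, 1] normalization mentioned in the range-clip description maps the [−1000, 1000] HU window linearly; the text does not state the mapping explicitly, and the constant and function names are illustrative.

```python
import numpy as np

HU_MIN, HU_MAX = -1000.0, 1000.0  # soft-tissue window used in this study


def normalize_hu(volume_hu):
    """Clip HU values to [HU_MIN, HU_MAX] and rescale them linearly to [0, 1]."""
    clipped = np.clip(np.asarray(volume_hu, dtype=np.float64), HU_MIN, HU_MAX)
    return (clipped - HU_MIN) / (HU_MAX - HU_MIN)


def denormalize_hu(normalized):
    """Invert normalize_hu for values inside the window (clipped values are lost)."""
    return np.asarray(normalized, dtype=np.float64) * (HU_MAX - HU_MIN) + HU_MIN


if __name__ == "__main__":
    # A bone-like value of 3000 HU saturates at 1 after clipping.
    print(normalize_hu([-1500, -1000, 0, 500, 3000]))
```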
This study is retrospective, and the patient data were obtained from Kyoto University Hospital, Kyoto, Japan. There were no specific inclusion/exclusion criteria for the participants. Patients participated voluntarily. Before the data were provided, written consent, including an introduction to the research, a description of the data, and an agreement statement, was obtained from each participant. In addition, all participants were given the option to opt out.
This study adhered to the Declaration of Helsinki, and the research was approved by the Ethics Review Board of Kyoto University Hospital and the Faculty of Medicine (approval number R1446).

Results
In this section, we first describe the experimental settings, then present the evaluation of the proposed parallel U-net architecture, and finally present the evaluation of the proposed organ-oriented CT reconstruction.

Experimental settings
In this subsection, we describe the criteria used for performance evaluation, the experimental parameters, and the properties of the data. In radiotherapy, errors in both CT values and organ layouts affect the accuracy of dose calculation. We therefore evaluated the proposed approaches in terms of both CT values and organ appearance, using the mean absolute error (MAE) to evaluate the reconstruction of CT values and the structural similarity index (SSIM) [26,27] to evaluate the appearance of the reconstructed slices.
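For reference, the two criteria can be sketched as follows. The MAE is the mean absolute difference between a reconstructed slice and its ground truth. For brevity, the hypothetical `global_ssim` evaluates the SSIM formula of Wang et al. once over the whole image with the usual constants K1 = 0.01 and K2 = 0.03; standard SSIM instead averages this quantity over local windows, as implemented, for example, by `skimage.metrics.structural_similarity`. The text does not state which variant or window settings were used, so this is only an illustration.

```python
import numpy as np


def mae(pred, target):
    """Mean absolute error between a reconstructed slice and the ground truth."""
    return float(np.mean(np.abs(np.asarray(pred, float) - np.asarray(target, float))))


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM formula evaluated over the whole image (a single window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```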
In the experiments, we evaluated the models' performance at reconstructing one, two, three, four, and five middle slices from each pair of adjacent slices (m = {1, 1.5, 2, 2.5, 3}, so that 2m − 1 = {1, 2, 3, 4, 5}). How the value of m is chosen is explained in the Discussion section. The values of C and B in (1) for the different organs, determined from the labeled training data, are shown in Table 3.
Using the proposed method, we can reconstruct multiple middle slices from any pair of adjacent slices. However, evaluating the proposed algorithm requires ground truth data that correspond to the reconstructed slices. To generate the ground truth data, we used the CT data with all slices as the densely sliced CT series and created sparse versions for performance evaluation. Suppose that m = 3, which means that five slices are reconstructed from two adjacent slices. We separate each original CT series into two groups: $\varphi_1 = \{S_1, S_7, S_{13}, \ldots, S_{n-12}, S_{n-6}, S_n\}$ and $\varphi_2 = \{S_2, S_3, S_4, S_5, S_6, S_8, S_9, S_{10}, S_{11}, S_{12}, \ldots, S_{n-5}, S_{n-4}, S_{n-3}, S_{n-2}, S_{n-1}\}$, where $\varphi_1$ is the sparse version and $\varphi_2$ contains the target slices to be reconstructed. Finally, the densely sliced CT series can be reconstructed by merging $\varphi_1$ and $\varphi_2$. For other values of m, the data can be regrouped in the same way for evaluation.
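The regrouping into $\varphi_1$ and $\varphi_2$ amounts to index bookkeeping. The sketch below (the names `training_groups` and `split_phi` are hypothetical) uses 0-based indices, so index 0 corresponds to $S_1$; with 2m − 1 middle slices, adjacent sparse slices are 2m apart. Trailing slices without an upper neighbor are dropped, which is an assumption; the text implicitly assumes that the last slice $S_n$ belongs to $\varphi_1$.

```python
def training_groups(num_slices, m):
    """Split slice indices 0..num_slices-1 into (lower, upper, middle) groups.

    lower/upper are sparse-input indices (phi_1) and middle lists the 2m - 1
    target indices (phi_2) between them.
    """
    num_middle = round(2 * m - 1)
    if num_middle < 1 or abs((2 * m - 1) - num_middle) > 1e-9:
        raise ValueError("2m - 1 must be a positive integer")
    step = num_middle + 1  # distance between adjacent sparse slices (= 2m)
    groups = []
    # Stop early enough that the upper neighbor still exists (assumption).
    for lower in range(0, num_slices - step, step):
        upper = lower + step
        groups.append((lower, upper, list(range(lower + 1, upper))))
    return groups


def split_phi(num_slices, m):
    """Return (phi_1, phi_2) as sorted lists of 0-based slice indices."""
    groups = training_groups(num_slices, m)
    phi_1 = sorted({g[0] for g in groups} | {g[1] for g in groups})
    phi_2 = [i for g in groups for i in g[2]]
    return phi_1, phi_2
```

For m = 3 and n = 13, `split_phi(13, 3)` yields indices [0, 6, 12] for $\varphi_1$ (that is, $S_1, S_7, S_{13}$) and the remaining ten indices for $\varphi_2$.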
For training, $\varphi_1$ and $\varphi_2$ were used as the input and output of the proposed parallel U-net architecture, respectively. For testing, $\varphi_1$ was used as the input to the neural network, and $\varphi_2$ was used as the ground truth. By comparing $\varphi_2$ with the output of the trained network in test mode, we can evaluate the reconstruction performance of the proposed approaches. More precise evaluation (e.g., of reconstructions of nonexistent CT data with a slice thickness of less than 0.5 mm) would require clinical measurements by clinicians (see the Discussion section). In this study, 80% (104 cases) of the data were used for training and the remaining 20% (26 cases) for testing.
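A case-level 80/20 split like the one described can be sketched as below. The text does not say how cases were assigned, so the random shuffle and the `seed` parameter of the hypothetical `split_cases` are assumptions. Splitting by case keeps all slices of a patient on the same side, so no test slice comes from a training patient.

```python
import random


def split_cases(case_ids, train_fraction=0.8, seed=0):
    """Shuffle case IDs reproducibly and split them into training and test lists."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train, test = split_cases(range(130))
    print(len(train), len(test))  # 104 26
```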
The parallel architecture with multiple U-nets requires more GPU resources than a single U-net in the training phase. In our experiments, each U-net was trained on a GeForce RTX 2080 with 11 GB of memory. We trained the multiple U-nets in parallel on multiple GPUs simultaneously, which took about one hour. For testing, however, only one GPU was used.

Evaluation of the proposed parallel U-net architecture-based CT slice reconstruction
First, we evaluated the performance of the proposed parallel U-net architecture at reconstructing one, two, three, four, and five (m = {1, 1.5, 2, 2.5, 3}) middle slices from pairs of adjacent slices. Table 1 summarizes the results: m = 1 had the lowest MAE, whereas m = 3 had the highest MAE. Overall, MAE tended to increase as m increased, although the MAE with m = 2 was slightly smaller than that with m = 1.5; we believe that the training with m = 1.5 may have been influenced by bias in the training data. In the following experiments, we use m = 3 as an example to compare the proposed approach with other methods because it is the most challenging setting.
Second, we compared the proposed parallel U-net architecture with conventional linear interpolation and with the other U-net method [2], which uses one U-net architecture to produce multiple slices, for m = 3. Table 2 compares the three methods' results over all reconstructed CT slices. The proposed approach reduces the MAE by 22.05% and 6.09% compared with linear interpolation and the other U-net method [2], respectively. Fig 6 shows that the differences between the MAE values of the proposed approach, linear interpolation, and the other U-net method [2] are significant (all p < 0.01). These results demonstrate that the proposed parallel U-net architecture achieves a significant improvement over both linear interpolation and the other U-net method [2]. We attribute the advantage over using one U-net to reconstruct multiple slices to the parallel strategy: learning parameters to fit one slice is easier than learning parameters to fit multiple slices.
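The text does not give the formula behind the reported percentages. Assuming they are relative reductions of MAE with respect to each baseline, they correspond to:

```latex
\text{reduction} = \frac{\mathrm{MAE}_{\text{baseline}} - \mathrm{MAE}_{\text{proposed}}}{\mathrm{MAE}_{\text{baseline}}} \times 100\%
```

Under this reading, the 22.05% reduction relative to linear interpolation means $\mathrm{MAE}_{\text{proposed}} = (1 - 0.2205)\,\mathrm{MAE}_{\text{linear}} = 0.7795\,\mathrm{MAE}_{\text{linear}}$.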

Evaluation of the proposed organ-oriented CT slice reconstruction
We also evaluated the proposed parallel U-net architecture combined with the proposed range-clip technique. Table 3 summarizes the MAE results for the liver, left kidney, right kidney, and stomach, and Table 4 presents the SSIM results for the same organs. To highlight the differences between the computed results and the ground truth, we normalized the data before calculating the SSIM value for each slice. Tables 3 and 4 show that the proposed parallel U-net architecture combined with the proposed range-clip technique achieved the best results. Figs 7-10 show the significant differences among linear interpolation, the proposed parallel U-net architecture, and the proposed organ-oriented method for the reconstruction of the liver, left kidney, right kidney, and stomach regions, respectively. These figures show that the proposed parallel U-net architecture significantly outperforms linear interpolation and that the proposed organ-oriented method further improves the reconstruction quality for all the specified organs. Comparing the slices reconstructed by the proposed methods with the ground truth shows that some details are lost. This is because we reconstructed CT slices that originally did not exist at all, leaving little information from which to reconstruct details. However, the slices reconstructed by the proposed methods are of higher quality than those created by linear interpolation. Although the results of linear interpolation appear to contain many details, these are artifacts that blur organ boundaries and can consequently lead to incorrect dose calculation. In contrast, the proposed parallel U-net architecture reduces these artifacts, and the proposed organ-oriented method achieves a further improvement. The combination of the two proposed methods yields more accurate boundaries than linear interpolation, which is important for dose calculation in radiotherapy.
The difference color maps show that the proposed method also improves the reconstruction quality inside the organs. Although obtaining slice reconstruction results that exactly match the ground truth is impossible, the proposed method reduces the artifacts around each organ more effectively than linear interpolation does. Rows 3-6 in Fig 12 show that the proposed method can reconstruct the shape of organs in situations where linear interpolation fails.

Discussion
In this section, we discuss several issues related to the proposed study.

Limitations of the proposed approach
The proposed method cannot be evaluated for reconstructing CT data with a slice thickness of less than 0.5 mm because no ground truth data are available for comparison. However, evaluations based on clinical measurements during surgery are possible. In future work, we plan to invite clinicians to evaluate the proposed method in clinical situations. Another limitation is that some details inside the organs are lost in the reconstructed CT slices, because reconstructing data that did not originally exist is difficult. However, the proposed approach provides more precise organ boundaries than the other methods compared, which can still contribute to dose calculation.

How to decide the value of m in Fig 1
The larger the value of m, the more middle slices are reconstructed and the more densely sliced the data become in the longitudinal direction. However, noise also increases as the spatial resolution of CT data is improved [28]. To ensure a valid reconstruction, we need to make a trade-off between the value of m and the magnitude of pixel-wise noise. In this study, we simply set a threshold μ to restrict m: we stopped increasing m once the maximum pixel-wise MAE exceeded μ.
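The stopping rule for m can be sketched as follows. Two points are assumptions because the text does not define them: the "maximum pixel-wise MAE" is taken as the absolute error per pixel averaged over the evaluated slices and then maximized over pixels, and the returned value is the largest m whose error does not exceed μ. `evaluate` is a hypothetical callable that trains or loads the model for a given m and returns this error on held-out data.

```python
import numpy as np


def max_pixelwise_mae(predictions, targets):
    """Average |error| per pixel over slices (axis 0), then take the worst pixel."""
    per_pixel = np.mean(np.abs(np.asarray(predictions) - np.asarray(targets)), axis=0)
    return float(per_pixel.max())


def choose_m(candidate_ms, evaluate, mu):
    """Increase m until the error exceeds mu; return the last m that stayed within it."""
    chosen = None
    for m in sorted(candidate_ms):
        if evaluate(m) > mu:
            break  # stop increasing m
        chosen = m
    return chosen


if __name__ == "__main__":
    # Illustrative error values only, not results from the study.
    errors = {1: 0.02, 1.5: 0.03, 2: 0.04, 2.5: 0.06, 3: 0.08}
    print(choose_m(errors, errors.get, mu=0.05))  # 2
```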

Person-dependent vs. Person-independent learning
The proposed approach for CT slice reconstruction is person-independent, which means that the range of each organ used in the proposed range-clip method is an approximation rather than a precise value, because the pixel values within the organs of one group of people are used to predict the ranges for other people. However, based on the proposed approach, a person-dependent model that achieves more precise organ ranges could be developed simply by changing the type of training data. Both person-independent and person-dependent learning have their own advantages. Person-independent learning saves time because one model can handle all samples. In contrast, person-dependent learning can achieve more accurate ranges for each organ but requires more training time because the training data must be changed for each sample. Since person-dependent reconstruction would entail a different research objective from this study, we will pursue it in future work.

Supervised-learning vs. self-learning
As mentioned above, the proposed approach is based on U-net, which is a supervised learning approach [2]. U-net has an encoder-decoder structure and requires labeled data for training. Note that both the input and output of the proposed networks come from the same CT data. The purpose of the proposed architecture is to let the networks learn the relationships and regularities within the target itself, which is similar to self-learning. Self-learning is a form of unsupervised learning that is widely used in image clustering [29,30]; it exploits the similarity among samples and the discrepancy between clusters to improve clustering performance [30]. In recent years, self-learning has also been applied to CT data processing [31][32][33]. For example, Xie et al. proposed a through-plane resolution CT imaging method based on CycleGAN [31], Fung et al. used self-supervised learning models for COVID-19 lung CT segmentation [32], and Niu et al. applied self-learning to reduce the noise of CT data [33]. These self-learning approaches do not need labeled data for training. However, it may be more difficult for them to guarantee reconstruction quality than for supervised learning algorithms, whose models learn directly from labeled data. Because labeled training data were available, we adopted supervised learning in this research. Nevertheless, comparing the proposed supervised approach with self-learning approaches such as CycleGAN [31] would be an interesting direction, and we plan to conduct such comparisons in future work.

A comparison with peer work
We compare the proposed approach with previously published work to highlight the differences.
A similar study reconstructed CT data from a large to a small slice thickness, using 3D CNNs to go from 3-mm (or 5-mm) to 1-mm slice thickness [17]. It supported only two reconstruction patterns: 3 mm to 1 mm and 5 mm to 1 mm. Unlike that work, the proposed approach improves the resolution in the longitudinal direction by reconstructing internal slices between pairs of adjacent slices. Our approach is based on 2D reconstruction of multiple slices, and the number of target slices to be reconstructed is not fixed, which makes the method more suitable for real applications. Furthermore, the proposed method can be used for either large-to-small slice-thickness reconstruction or large-to-small slice-increment reconstruction, as shown in Fig 1.

Overlap issue
In this subsection, we discuss the case in which two original slices overlap. In that case, we cannot reduce the slice increment directly. However, we can first reduce the slice thickness so that the originally overlapping slices no longer overlap. We can then further reduce the slice thickness or the slice increment to improve the resolution in the longitudinal direction using the proposed approach.

Conclusion
In summary, we proposed a parallel U-net architecture to reconstruct CT slices from neighboring slices. Experimental results demonstrate that this reconstruction is valid, with the U-net architecture playing an important role. Moreover, we proposed a range-clip technique to refine the reconstruction for each specified organ. The proposed approach could be used to reduce the slice thickness or slice increment without any dose increase.